618 results found.
Written
Corpus,
Language Type:
Monolingual
Languages:
Bulgarian Croatian Czech Danish Dutch English Estonian Finnish French German Greek Hungarian Icelandic Irish Italian Latvian Lithuanian Maltese Polish Portuguese Romanian Slovak Slovenian Spanish Swedish
Availability:
Freely Available
License:
CC-0
Size:
341856530 sentences
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
-
Paper title:ParaCrawl: Web-Scale Acquisition of Parallel Corpora
-
Paper track:Long/Resources and Evaluation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Philipp Koehn | ParaCrawl | /N |
Documentation:
None
Written
Treebank,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
None
Production Status:
Existing-used
Use:
Parsing and Tagging
-
Paper title:Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
-
Paper track:Short/Machine Learning for NLP
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Universal Dependencies | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
From NIST
License:
Size:
None
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
-
Paper title:Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
-
Paper track:Short/Machine Learning for NLP
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Reuters RCV1/RCV2 Multilingual Corpus | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Arabic German Turkish
Availability:
Freely Available
License:
Apache License 2.0
Size:
814 sentences
Production Status:
Newly created-finished
Use:
Document Classification, Text categorisation
-
Paper title:Multi-Label and Multilingual News Framing Analysis
-
Paper track:Long/NLP Applications
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Afra Feyza Akyürek | Multilingual Gun Violence Frame Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Chinese English German Hindi Spanish Vietnamese
Availability:
Freely Available
License:
Size:
50+ GByte
Production Status:
Existing-used
Use:
Machine Learning
-
Paper title:MLQA: Evaluating Cross-lingual Extractive Question Answering
-
Paper track:Long/Question Answering
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Patrick Lewis | Wikipedia | /N |
Documentation:
None
Written
Annotation Tool,
Language Type:
Monolingual
Languages:
German
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Corpus Creation/Annotation
-
Paper title:Modeling Word Formation in English–German Neural Machine Translation
-
Paper track:Short/Machine Translation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Marion Weller-Di Marco | Stuttgart MORPhology (SMOR) | /N |
Documentation:
yes (in German).
Written
Evaluation Tool,
Language Type:
Multilingual
Languages:
English French German Hebrew Russian
Availability:
Freely Available
License:
Apache License, Version 2.0
Size:
62 MByte
Production Status:
Newly created-in progress
Use:
Syntactic Evaluation (and Evaluation Set Generators)
-
Paper title:Cross-Linguistic Syntactic Evaluation of Word Prediction Models
-
Paper track:Long/Interpretability and Analysis of Models for NLP
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Aaron Mueller | CLAMS: Cross-Linguistic Assessment of Models on Syntax | /N |
Documentation:
README.md on Github repository in English
Speech
Corpus,
Language Type:
Multilingual
Languages:
English Finnish German Mandarin Chinese
Availability:
Freely Available
License:
OpenSource
Size:
None
Production Status:
Existing-used
Use:
Speech Synthesis
-
Paper title:Cross-lingual Voice Conversion with Disentangled Universal Linguistic Representations
-
Paper track:7.11 Cross-lingual and multilingual aspects in spe/Oral Presentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Zhenchuan Yang | VCC2020 | /N |
Documentation:
http://www.vc-challenge.org/
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Dutch English French German Italian Polish Portuguese Spanish
Availability:
Freely Available
License:
CC BY 4.0
Size:
None
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
-
Paper title:LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
-
Paper track:8.1 Feature extraction and low-level feature model/Oral Presentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Laurent Besacier | Multilingual LibriSpeech (MLS) | /N |
Documentation:
https://arxiv.org/abs/2012.03411, English, public
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Catalan Chinese English Esperanto French German Italian Kabyle Kinyarwanda Persian Polish Russian Spanish Welsh
Availability:
Freely Available
License:
Creative Commons license
Size:
8.8k hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
-
Paper title:LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
-
Paper track:8.1 Feature extraction and low-level feature model/Oral Presentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Laurent Besacier | Common Voice | /N |
Documentation:
https://arxiv.org/pdf/1912.06670.pdf, English, public




